neural information processing system 32
Copula Multi-label Learning
A formidable challenge in multi-label learning is to model the interdependencies between labels and features. Unfortunately, the statistical properties of existing multi-label dependency modelings are still not well understood. Copulas are a powerful tool for modeling dependence of multivariate data, and achieve great success in a wide range of applications, such as finance, econometrics and systems neuroscience. This inspires us to develop a novel copula multi-label learning paradigm for modeling label and feature dependencies. The copula based paradigm enables to reveal new statistical insights in multi-label learning. In particular, the paper first leverages the kernel trick to construct continuous distribution in the output space, and then estimates our proposed model semiparametrically where the copula is modeled parametrically, while the marginal distributions are modeled nonparametrically. Theoretically, we show that our estimator is an unbiased and consistent estimator and follows asymptotically a normal distribution. Moreover, we bound the mean squared error of estimator. The experimental results from various domains validate the superiority of our proposed approach.
Sample Adaptive MCMC
For MCMC methods like Metropolis-Hastings, tuning the proposal distribution is important in practice for effective sampling from the target distribution \pi. In this paper, we present Sample Adaptive MCMC (SA-MCMC), a MCMC method based on a reversible Markov chain for \pi^{\otimes N} that uses an adaptive proposal distribution based on the current state of N points and a sequential substitution procedure with one new likelihood evaluation per iteration and at most one updated point each iteration. The SA-MCMC proposal distribution automatically adapts within its parametric family to best approximate the target distribution, so in contrast to many existing MCMC methods, SA-MCMC does not require any tuning of the proposal distribution. Instead, SA-MCMC only requires specifying the initial state of N points, which can often be chosen a priori, thereby automating the entire sampling procedure with no tuning required. Experimental results demonstrate the fast adaptation and effective sampling of SA-MCMC.
Causal Regularization
We argue that regularizing terms in standard regression methods not only help against overfitting finite data, but sometimes also help in getting better causal models. We first consider a multi-dimensional variable linearly influencing a target variable with some multi-dimensional unobserved common cause, where the confounding effect can be decreased by keeping the penalizing term in Ridge and Lasso regression even in the population limit. The reason is a close analogy between overfitting and confounding observed for our toy model. In the case of overfitting, we can choose regularization constants via cross validation, but here we choose the regularization constant by first estimating the strength of confounding, which yielded reasonable results for simulated and real data. Further, we show a'causal generalization bound' which states (subject to our particular model of confounding) that the error made by interpreting any non-linear regression as causal model can be bounded from above whenever functions are taken from a not too rich class.
Visual Sequence Learning in Hierarchical Prediction Networks and Primate Visual Cortex
In this paper we developed a computational hierarchical network model to understand the spatiotemporal sequence learning effects observed in the primate visual cortex. The model is a hierarchical recurrent neural model that learns to predict video sequences using the incoming video signals as teaching signals. The model performs fast feedforward analysis using a deep convolutional neural network with sparse convolution and feedback synthesis using a stack of LSTM modules. The network learns a representational hierarchy by minimizing its prediction errors of the incoming signals at each level of the hierarchy. We found that recurrent feedback in this network lead to the development of semantic cluster of global movement patterns in the population codes of the units at the lower levels of the hierarchy. These representations facilitate the learning of relationship among movement patterns, yielding state-of-the-art performance in long range video sequence predictions on benchmark datasets. Without further tuning, this model automatically exhibits the neurophysiological correlates of visual sequence memories that we observed in the early visual cortex of awake monkeys, suggesting the principle of self-supervised prediction learning might be relevant to understanding the cortical mechanisms of representational learning.
- North America > United States (0.28)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.05)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- (5 more...)
- North America > United States > Michigan (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Michigan (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Asia > Middle East > Jordan (0.04)